The Kolmogorov-Smirnov test and its use for the identification of fireball fragmentation

We propose an application of the Kolmogorov-Smirnov test to rapidity distributions of individual events in ultrarelativistic heavy ion collisions. The test is particularly suitable for recognising non-statistical differences between the events. Thus, when applied to a narrow centrality class, it could indicate differences between events which would not be expected if all events evolved according to the same scenario. In particular, as an example we assume here a possible fragmentation of the fireball into smaller pieces at the quark/hadron phase transition. Quantitative studies are performed with a Monte Carlo model capable of simulating such a distribution of hadrons. We conclude that the Kolmogorov-Smirnov test is a very powerful tool for the identification of the fragmentation process.

PACS numbers: 02.50.-r, 24.10.Pa, 24.60.Ky, 25.75.Gz

I. INTRODUCTION

The highly excited matter created in ultrarelativistic nuclear collisions expands very fast. It is commonly accepted that a deconfined phase has been reached in Au+Au collisions at RHIC [?], while the onset of deconfinement has been advocated at SPS energies.

While in lattice QCD calculations a static thermodynamic medium is assumed, in heavy ion collisions the situation is vastly different. Here, the longitudinal expansion dynamics leads to a rapid passage from the deconfined to the confined phase. A system which undergoes the phase transition quickly may not follow the usual equilibrium scenario. In fact, for a first-order phase transition, the high temperature phase may survive down to temperatures drastically below the transition temperature, i.e. the system supercools. If the expansion rate is faster than the nucleation rate of bubbles of the new phase, the system reaches the point of spinodal instability¹. Beyond such a point, entropy is gained if the system separates into two phases and so it becomes mechanically unstable. Spinodal fragmentation connected with the nuclear liquid/gas phase transition has been identified in heavy ion collisions at a few hundred MeV per nucleon [?], and it has been proposed that it might be the actual scenario at ultrarelativistic energies as well [?, 10]. Fragmentation assumes that at the phase transition the system decays into droplets of smaller size. These droplets then emit hadrons.

Lattice calculations indicate, however, that at RHIC and LHC the transition from partonic to hadronic matter is a rapid but smooth crossover [11]. Thus spinodal decomposition seems to be an irrelevant scenario in this case. On the other hand, conformal symmetry is broken close to the phase transition and, as a consequence, the bulk viscosity, which is negligible otherwise, shows a peak here [?, ?, 14]. Bulk viscosity acts against the expansion and slows it down. As a result, if the system previously accumulated kinetic energy due to expansion, it may fragment [14]. An analysis of hydrodynamic instabilities shows that such a scenario may be realistic [?].

Hence, it appears that fragmentation may happen in ultrarelativistic nuclear collisions. Many kinds of observables might be sensitive to it. Most notable are multiplicity fluctuations in varying rapidity windows [16], fluctuations of mean $p_t$ [17], rapidity correlations, proton and kaon correlations [?], meson production [18], and two-pion femtoscopy [?].

¹ This is the inflection point of the dependence of entropy on an extensive variable, see e.g. [?].


In this paper we inspect event-by-event fluctuations of rapidity distributions. If final state hadrons are emitted from droplets, their velocities will be close to those of the droplets. Thus, clustering would appear in their momentum distribution. Moreover, in each event the clusters will have different velocities, so the momentum distribution will vary from event to event. An important contribution to clustering will also come from resonance decays, and we shall investigate this effect. Specifically, here we shall compare rapidity distributions from different events and look for differences due to fireball fragmentation. To this end, we employ the Kolmogorov-Smirnov (KS) test, which can be used for the identification of non-statistical differences between two empirical distributions [?]. An important advantage of the KS test is its independence from the underlying distribution of the measured quantity.

FIG. 1: Construction of the two empirical cumulative distribution functions, one with thin solid (blue) line and one with thick dashed (red) line. The maximum distance between them is D.

In the next Section we briefly introduce the KS test. Then, we illustrate its sensitivity with a few toy simulations. For more realistic studies we generate artificial data with the help of the event generator DRAGON [24], which is very briefly introduced in Section IV. Results obtained with these data are presented in Section V. We conclude in Section VII. In the Appendix we review the evaluation of the cumulative distribution function for the Kolmogorov distribution.

II. THE KOLMOGOROV-SMIRNOV TEST AND HOW TO USE IT

Let us start by explaining the technical part of the problem. One has two empirical distributions in a variable x, which can be rapidity, $p_t$, or something else². (In the present work we use rapidities.) The multiplicities may differ. The question we want to ask is whether the two empirical distributions are the same in the sense that they correspond to the same underlying theoretical single-particle probability density, with no correlations between particles within one event.

² Note that for a cyclic variable, e.g. the azimuthal angle $\phi$, the KS test cannot be applied and instead a modification known as the Kuiper test [25, 26] must be employed.

Practically, the quantity x is measured for each particle in an event. The empirical cumulative distribution function (ECDF) is constructed so that a step of the height $1/n_i$ ($n_i$ is the multiplicity of the event) is made at every position of a measured x (Fig. 1). This is done for both events of a pair. Subsequently, one finds the maximum vertical distance between the two ECDFs and introduces

$$ d = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, D , \qquad (1) $$

where D is the distance of the two ECDFs and $n_1$, $n_2$ are the multiplicities of the two data sets. The procedure of the test is illustrated in Figure 1. The cumulative distribution function of the Kolmogorov distribution concerns the case of events generated from the same underlying theoretical probability distribution for the quantity x, p(x). It will be expressed with the help of the function Q(d) as

$$ P(d' < d) = 1 - Q(d) , \qquad (2) $$

where $P(d' < d)$ is the probability that we find a difference $d'$ smaller than $d$. An important feature of this approach is that Q(d) does not depend on the particular shape of the theoretical distribution p(x). Unfortunately, the general form of Q(d) valid for any multiplicities and distances d is not suitable for practical evaluation. Usable approximate expressions for Q(d) are summarised in the Appendix. Since Q(d) is one minus the cumulative distribution function of d, it follows that if many pairs of events were drawn from a set of events all generated from the same underlying probability distribution, and for each pair the quantities d and Q(d) were determined, then the Q's would be distributed uniformly. A deviation from the uniform distribution, particularly an enhanced population of low Q's (large d's), indicates that the events are not drawn independently from the same underlying distribution. This is how the KS test will be used here.
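The procedure described in this Section can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the results of this paper: the function names are hypothetical, and Q(d) is evaluated with the asymptotic series of eq. (A1), which the Appendix notes is not accurate for multiplicities of a few hundred.

```python
import bisect
import math


def ks_distance(sample_a, sample_b):
    """Maximum vertical distance D between the ECDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d_max = 0.0
    # Both ECDFs are step functions that change only at measured values,
    # so it suffices to compare them just after every step.
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)
        f_b = bisect.bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(f_a - f_b))
    return d_max


def scaled_distance(d_max, n_1, n_2):
    """d = sqrt(n1 n2 / (n1 + n2)) D, as in eq. (1)."""
    return math.sqrt(n_1 * n_2 / (n_1 + n_2)) * d_max


def q_asymptotic(d, k_max=100):
    """Asymptotic Q(d) = -2 sum_{k>=1} (-1)^k exp(-2 k^2 d^2), eq. (A1)."""
    if d < 0.2:
        # The alternating series converges slowly here, and Q differs
        # from 1 by far less than double precision can resolve.
        return 1.0
    total = sum((-1) ** k * math.exp(-2.0 * k * k * d * d)
                for k in range(1, k_max + 1))
    return min(1.0, max(0.0, -2.0 * total))


def ks_pair(sample_a, sample_b):
    """Return (D, d, Q) for one pair of events."""
    d_max = ks_distance(sample_a, sample_b)
    d = scaled_distance(d_max, len(sample_a), len(sample_b))
    return d_max, d, q_asymptotic(d)
```

Calling `ks_pair` on many pairs of events and histogramming the returned Q values gives the Q-histogram discussed in the text.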

Note that the KS test does not identify the physical origin of the difference between the events. It is a robust way to establish that there is a difference; the origin, however, must be singled out by other means. In addition to fireball fragmentation, possible origins are fluctuations of the initial state of the fireball evolution, final resonance decays, conservation laws, and quantum correlations. A detailed investigation of these will be pursued in subsequent papers. The important message of the KS test is that it can disprove the usual paradigm that data from many collisions (within the same centrality class) are produced by basically identical fireballs.

It is important to realise that the number of significant decimal figures to which the quantity x (rapidity here) is measured may also influence the result of the KS test if it is applied to a large number of pairs of events. This is illustrated in Figure 2. The peak at $Q \to 1$ increases as the number of decimal places taken into account is lowered. The explanation is trivial but instructive. Only rapidities between 0 and 1 were generated. Within $10^5$ events, each having a multiplicity around 200, there are about $2 \times 10^7$ particles. Hence, if their values of x are given to fewer than 8 figures, we are guaranteed to have repeating values of x in our sample. For 6 given figures we expect each value to appear on average 20 times, for 4 figures it is 2000 times, and for 2 figures we are even at 200,000 times! Clearly, this effect correlates the events since it artificially chooses x from a finite number of possible values. According to its construction, the D's will acquire smaller values on average.

FIG. 2: Histograms of Q's from the KS test applied on $10^5$ pairs out of $10^5$ events generated from a uniform distribution between 0 and 1 in the variable x (rapidity) and with multiplicities distributed according to a Poisson distribution with the mean 200. Different histograms correspond to rapidity data truncated after 2, 4, and 6 decimal places. The histogram with 8 significant figures is identical to that with 6 figures.
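The counting argument above is simple arithmetic and can be checked directly. The short Python sketch below (the function name is hypothetical) divides the total number of particles by the number of distinct values that x can take on the unit interval when it is kept to a given number of decimal places.

```python
def mean_repetitions(n_events, mean_multiplicity, decimals, x_range=1.0):
    """Average number of particles sharing one truncated value of x."""
    n_particles = n_events * mean_multiplicity
    # Number of distinct values available on an interval of length x_range
    n_values = round(x_range * 10 ** decimals)
    return n_particles / n_values


for decimals in (2, 4, 6, 8):
    print(decimals, mean_repetitions(10 ** 5, 200, decimals))
```

For 8 decimal places the ratio drops below one, which is consistent with the statement that repetitions stop being guaranteed at that precision.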

We have also checked that the normal fluctuation of the number of entries in a bin of the Q-histogram is equal to the square root of the number of entries. We thus propose the use of the quantity

$$ R = \frac{N_0 - N_{\rm tot}/B}{\sigma_0} , \qquad (3) $$

where $N_0$ is the number of pairs in the first bin of the Q-histogram (next to Q = 0), $N_{\rm tot}$ is the total number of pairs, B is the number of bins of the Q-histogram, and $\sigma_0 = \sqrt{N_{\rm tot}/B}$ is the expected standard deviation of the number of entries in the first bin. The modulus of R should be of the order of 1; values considerably bigger than that indicate non-statistical differences between the events.
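As an illustration, R can be computed from a list of Q values as in the following Python sketch. The function name and the default number of bins are assumptions made for this example only; the text above does not fix the binning.

```python
import math


def r_parameter(q_values, n_bins=100):
    """Excess of the first Q bin in units of its expected fluctuation.

    Implements R = (N_0 - N_tot/B) / sqrt(N_tot/B), where N_0 counts the
    pairs with Q in the first of B equal bins on [0, 1).
    """
    n_tot = len(q_values)
    bin_width = 1.0 / n_bins
    n_0 = sum(1 for q in q_values if 0.0 <= q < bin_width)
    expected = n_tot / n_bins
    return (n_0 - expected) / math.sqrt(expected)
```

A perfectly flat Q-histogram gives R = 0, while an excess of pairs at low Q drives R to large positive values.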



FIG. 3: (Color online) Large plot: the Q-histograms from the KS test on event samples consisting of two classes of events. One class was generated from a Gaussian distribution with the mean 0 and the width $\sigma = 0.1$. The other class was generated from a Gaussian distribution with the same width, but with the mean shifted by $0.1\sigma$. The multiplicities of the events are 32 (black dotted histogram), 128 (blue dashed), and 512 (red solid). Smaller inset plot: the dependence of the parameter R on the multiplicity of the events for the difference of the means equal to $0.01\sigma$ (black circles), $0.1\sigma$ (magenta squares), $0.5\sigma$ (blue triangles), and $1\sigma$ (red upside down triangles).

III. THE SENSITIVITY OF THE TEST

In this section we study how the proposed method works in clear-cut examples. First, we generate samples of "events" where one half of all events is generated according to a Gaussian distribution with the width $\sigma = 0.1$. For the second half of the events we keep the same width and vary the mean: we have samples with the mean shifted with respect to the other half by $2\sigma$, $1\sigma$, $0.5\sigma$, $0.1\sigma$, and $0.01\sigma$. In Figure 3 we observe that the KS procedure recognises the difference of $0.1\sigma$ well if the average multiplicity is 512, and that the resolution power decreases when the multiplicity is lowered to 32. As seen in the smaller inset plot, for smaller distances between the two Gaussian means the difference is not recognised, while for larger distances the difference is resolved by the test for all multiplicities.

FIG. 4: The Q-histograms from event samples consisting of two classes of events where rapidities were generated from Gaussian profiles with the same mean and the multiplicity was Poisson-distributed with the mean 512. The width in one class of events was 0.1. The width in the second class has been varied; different histograms correspond to different widths. The values of R are written in the legend.

In Figure 4 we explore the effect of a variable width. One half of the events was simulated with a Gaussian distribution with the width of 0.1 and the other half with the same mean but a different width. The widths are 1.0, 0.12, 0.11, 0.101, and 0.09. The multiplicity was Poisson-distributed with the mean of 512. We observe that the difference is picked up by the procedure except for the cases where the widths differ by ten per cent or less.

FIG. 5: The effect of the number of Gaussian sources on the Q-histogram. In the legend, left is the number of Gaussian sources distributed uniformly between -1 and 1, right is the average number of pions from each source. The width of each Gaussian source was 0.707.

Finally, we test a case which is closest to the fireball fragmentation scenario that we want to explore in detail. In Figure 5 we show the results from a simulation where each event consists of a superposition of many Gaussian distributions. The width of all these distributions is 0.707, motivated by the typical spread of the pion rapidity at a realistic freeze-out temperature. The means of the Gaussians are generated from a uniform distribution between -1 and 1. We test cases with 16, 32, 64, 128, 256, and 512 Gaussians per event, which emit on average 128, 64, 32, 16, 8, and 4 particles per Gaussian, respectively, so that the total multiplicity is always 2048. We observe that even in the least favorable simulation, with a large number of small droplets, the difference between events is clearly visible.
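A toy event of this kind can be generated as in the following Python sketch. It is an illustration under stated assumptions rather than the code behind Figure 5: the number of particles per source is taken to be Poisson-distributed (the text only quotes the averages), and the function names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Poisson sample via Knuth's multiplication method.

    Adequate for the modest means used here (up to a few hundred).
    """
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def toy_event(rng, n_sources, mean_per_source, width=0.707, y_range=1.0):
    """Rapidities from Gaussian sources with uniformly placed means."""
    rapidities = []
    for _ in range(n_sources):
        centre = rng.uniform(-y_range, y_range)
        for _ in range(poisson(rng, mean_per_source)):
            rapidities.append(rng.gauss(centre, width))
    return rapidities


rng = random.Random(2024)
# 16 sources emitting 128 particles each on average: about 2048 in total
event = toy_event(rng, n_sources=16, mean_per_source=128)
```

Because the source positions are drawn anew for every event, two such events differ by more than statistical fluctuations, which is what the KS test picks up.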



IV. MONTE-CARLO DROPLET GENERATOR 

Realistic event samples to which the KS test is applied were generated with the help of the Monte Carlo event generator DRAGON [24]. Here we provide a very brief overview of its capabilities.

DRAGON assumes that the fireball decays into droplets which are distributed according to the blast-wave model. Thus their distribution in position and velocity is given by

$$ S_D(x,v) \propto H(\eta)\, \Theta(R - r)\, \delta(\tau - \tau_0)\, \delta^{(4)}(v - u(x)) , \qquad (4) $$

where we use polar coordinates $r$ and $\phi$, the space-time rapidity and the longitudinal proper time

$$ \eta = \frac{1}{2} \ln \frac{t+z}{t-z} , \qquad (5) $$

$$ \tau = \sqrt{t^2 - z^2} , \qquad (6) $$


as coordinates in the space-time. The fireball has a transverse radius $R$ and $\tau_0$ is the Bjorken proper time of the decay. The four-velocity of the droplet $v$ is given by the local flow velocity at the position where the droplet is created,

$$ u^\mu(x) = (\cosh\eta \cosh\eta_t,\ \cos\phi \sinh\eta_t,\ \sin\phi \sinh\eta_t,\ \sinh\eta \cosh\eta_t) , \qquad (7) $$

with

$$ \eta_t = \sqrt{2}\, \rho_0\, \frac{r}{R} , \qquad (8) $$

where $\rho_0$ is a model parameter. (The model is designed so that it can simulate azimuthally non-symmetric fireballs, but we do not explore such a possibility here.) The function $H(\eta)$ specifies the space-time rapidity distribution. It can be uniform or Gaussian. For the present investigation we use the uniform distribution in rapidity.

The volumes of the droplets are random according to 
a gamma distribution 



(9) 



with a model parameter b. The droplets decay into hadrons exponentially in time, so the emission times of the hadrons are distributed in the rest frame of the emitting droplet according to $\exp(-\tau/R_d)$, where $R_d$ is the radius of the droplet. A droplet emits hadrons according to a thermal distribution with a temperature $T_k$ until it uses up all of its mass. The mass of the droplet is determined from its volume and the energy density, which is set to 0.7 GeV fm$^{-3}$.
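The droplet properties described above can be sketched as follows. This is a hedged illustration, not DRAGON itself: the shape parameter of the gamma distribution and the spherical droplet geometry are assumptions introduced only for this example, and all names are hypothetical. Only the scale parameter b, the energy density of 0.7 GeV fm$^{-3}$ and the exponential emission law are taken from the text.

```python
import math
import random

ENERGY_DENSITY = 0.7  # GeV/fm^3, value quoted in the text


def sample_droplet(rng, b, shape=2.0):
    """Draw one droplet: (volume in fm^3, mass in GeV, radius in fm).

    `shape` is an assumed gamma shape parameter; `b` is the scale.
    """
    volume = rng.gammavariate(shape, b)
    mass = ENERGY_DENSITY * volume
    # Radius under the assumption of a spherical droplet
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return volume, mass, radius


def sample_emission_time(rng, radius):
    """Emission time distributed as exp(-tau / R_d) in the droplet frame."""
    return rng.expovariate(1.0 / radius)


rng = random.Random(7)
volume, mass, radius = sample_droplet(rng, b=10.0)
tau = sample_emission_time(rng, radius)
```

In a full generator, hadrons would be emitted with thermal momenta until the mass of the droplet is used up; that step is omitted here.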

Hadrons may be emitted from the droplets or produced in the remaining space between them. The relative abundance of those emitted from droplets is specified as a model parameter. Hadrons emitted from the bulk are generated according to the blast-wave emission function

$$ S(x,p)\, d^4x = \frac{2s+1}{(2\pi)^3}\, m_t \cosh(y - \eta)\, \exp\left(-\frac{p_\mu u^\mu}{T_k}\right) \Theta(R - r)\, H(\eta)\, \delta(\tau - \tau_0)\, d\tau\, \tau\, d\eta\, r\, dr\, d\phi . \qquad (10) $$

Here the factor $(2s + 1)$ denotes spin degeneracy.

Resonances are included in the simulation. They decay according to the standard two-body or three-body kinematics. Probabilities of production of individual species are given by the statistical model with a chemical freeze-out temperature $T_{ch}$ and chemical potentials for baryon number and strangeness.

V. FLUCTUATING RAPIDITY DISTRIBUTIONS

The Monte Carlo event generator DRAGON [24] is employed to simulate realistic data on which the KS test is performed. We use the test on data generated for RHIC Au+Au collisions at $\sqrt{s} = 200\,A$GeV and FAIR Au+Au collisions at $\sqrt{s} = 7.6\,A$GeV. For the data analysis we accept hadrons within the rapidity interval [-0.5, 0.5].

For RHIC, we have generated events with a uniform rapidity distribution in the interval [-3, 3]. The total hadron multiplicity was set to $dN/dy = 1000$. The chemical composition is determined by the following choice of parameters: $T_{ch} = 155$ MeV, $\mu_B = 26$ MeV. We neglect the strangeness chemical potential. The list of resonances includes mesons up to a mass of 1.5 GeV/$c^2$ and baryons up to 2 GeV/$c^2$. The geometry of the decaying fireball is given by the radius $R \approx 10$ fm and $\tau_0 = 9$ fm/c. The dynamical state of the fireball is set by the kinetic freeze-out temperature $T_k = 150$ MeV and the transverse expansion gradient $\eta_f = 0.6$. We set the volume parameter of the droplets b to the value of 10 fm$^3$. As a first benchmark, complementary samples of 10,000 events are generated: one with all particles being emitted from droplets, the other with all particles being emitted from the bulk fireball.

As a second benchmark test we generate 10,000 events at the FAIR energy of $\sqrt{s} = 7.6\,A$GeV where no particles are emitted from droplets. In this case the chemical freeze-out parameters are set to the corresponding values $T_{ch} = 140$ MeV, $\mu_B = 375$ MeV, and $\mu_S = -53$ MeV. The kinetic freeze-out temperature is $T_k = 140$ MeV and the transverse expansion of the fireball is characterised by $\eta_f = 0.4$. Here, the rapidity distribution is Gaussian with a width of 0.7 and the total hadron multiplicity is 1,500. The transverse radius of the fireball and its Bjorken lifetime are 9 fm and 8 fm/c, respectively.

In Figure 6 we show the difference between the Q-histograms from events with and without droplets. For the RHIC energy, one observes a characteristic enhancement towards small Q values in the case of particle emission from droplets for all investigated particle species except (anti-)protons. For the setting without droplets (i.e. only bulk emission) this low Q enhancement is strongly suppressed. Quantitatively, this is reflected in a factor of 10 difference of the extracted R values. The RHIC results without droplet formation are also in line with the results obtained at FAIR energies, showing that the KS test does not produce falsely positive results when going to smaller samples with a different rapidity distribution.

Resonance decays also have a clustering effect on the decay products. Therefore, a signal of clustering is also seen in the set of events without droplets. In the case of all hadrons, the responsible resonances are mainly $\rho$'s and $\Delta$'s. If we limit our analysis to pions only, then there is a correlation due to the $\rho$. To test this hypothesis, one can perform the KS test with protons only, since there is no resonance that would decay into two baryons. The drawback of using protons only is that the statistics are limited in two ways. Firstly, their total multiplicity is lower, e.g. there are only 10 to 50 protons in the acceptance per RHIC event. Therefore, one observes fewer pairs at small Q and a peak at Q close to 1 in case there are no droplets (see Appendix). Secondly, if the droplets are small, protons are a less ideal probe because a droplet may not have enough energy to emit more than one proton, and the correlation is then gone. An alternative solution is to use pions of the same charge. These are more abundant than protons and no strong effect of resonances is seen here. Note, however, that we have not included the effect of pair wave function symmetrisation which leads to Bose-Einstein correlations.

Note also, that resonances not only introduce corre- 
lations, they can also weaken the correlations due to 
droplets. Resonance decay products obtain some momen- 
tum due to higher mass of the mother resonance. Thus, 
the velocities of decay products will be more smeared 
around the velocity of the droplet which emitted the res- 
onance than the velocities of hadrons emitted from the 
droplet directly. 

The influence of the size of the droplets is studied in Figure 7. The parameters of the simulation are kept the same as in the previous case, but the volume parameter b is varied to the values 5, 10, 20, and 50 fm$^3$. All particles are emitted from droplets. We do the KS test with charged hadrons. As expected from the previous analysis (with b = 10 fm$^3$), a dominant low Q peak emerges for all droplet volumes down to 5 fm$^3$. From this we conclude that even small droplets can be detected with the analysis method presented here.

As a final physics benchmark of the KS test we explore the effect of changing the fraction of the total multiplicity that comes from droplets. A systematic study of how the percentage of hadrons emitted from droplets affects the result is presented in Figure 8. Here, the size of the droplets is fixed to b = 10 fm$^3$ and only the percentage of hadrons from droplets is varied. Even if only one quarter of all hadrons comes from the droplets and the rest from the gas in between them, the signal is well visible and the KS test can discriminate between the formation and the non-formation of droplets.



VI. RESOLUTION 

Finally, we address the experimentally crucial question whether the droplet signal in the Q-histogram stays recognisable if the rapidities are measured with finite resolution. We investigate this issue in Figure 9. Events simulated with DRAGON for a volume parameter b = 10 fm$^3$ and with 50 per cent of hadrons emitted from droplets are taken, and the generated rapidities are smeared by a Gaussian with widths of 0.1, 0.5, and 1 to mimic the finite resolution of experimental measurements. The events with smeared rapidities are then processed with the KS test. We observe a gradual weakening of the signal strength at low Q. For smearing by 0.5 units of rapidity the peak height becomes comparable with the peak resulting from resonance decays only (cf. Figure 6). For even poorer resolution, the peak cannot be regarded as an unambiguous signal for droplet formation. The resolution, however, is usually on the level of $\Delta y \approx 0.1$.

FIG. 6: (Color online) The Q-histograms resulting from simulations of realistic hadronic final states with the help of DRAGON. Solid (red) histograms correspond to simulations of RHIC Au+Au collisions with droplets. Dashed (blue) histograms are from simulations for RHIC without droplets. Dotted (brown) histograms show the results of simulations for nuclear collisions at FAIR without fragmentation. Different panels show results obtained for all hadrons, charged hadrons, $\pi^+$, $\pi^-$, charged pions, protons and antiprotons. The values of R are indicated in the panels.
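The smearing step can be illustrated with a short Python sketch. The function names are hypothetical, and applying the acceptance cut after the smearing is an assumption of this example; the text does not state the order of the two operations.

```python
import random


def smear_rapidities(rapidities, resolution, rng):
    """Add Gaussian noise of width `resolution` to every rapidity."""
    return [y + rng.gauss(0.0, resolution) for y in rapidities]


def in_acceptance(rapidities, y_min=-0.5, y_max=0.5):
    """Keep only the rapidities inside the analysis window."""
    return [y for y in rapidities if y_min <= y <= y_max]


rng = random.Random(11)
generated = [rng.uniform(-3.0, 3.0) for _ in range(1000)]
for resolution in (0.1, 0.5, 1.0):
    measured = in_acceptance(smear_rapidities(generated, resolution, rng))
    print(resolution, len(measured))
```

The smeared and selected rapidities would then be passed to the KS test in the same way as the unsmeared ones.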



VII. CONCLUSIONS 

The Kolmogorov-Smirnov test is a powerful tool in searching for non-statistical differences between events. The test itself is more general than investigated here, and further applications will be presented in following papers. The logic of its use is the following: select a class of events which are "as identical as possible", in particular in a very narrow centrality class. Conventional scenarios predict that each event would evolve according to the same scenario and the final distributions of hadrons would be identical in all events. The KS test is able to detect deviations from this scenario. If an effect is observed, it remains to be studied what phenomenon leads to the positive result.

As a currently widely discussed topic, we focussed the present investigation on the possible decay of the fireball into smaller droplets. The present study showed that the KS test is perfectly suited for this task and allows a prominent signal to be extracted. The signal is robust even if only a small fraction of the hadrons comes from the droplets, and it survives realistic final rapidity resolution. Thus, the test can also be used in a negative way: if its application to data yields only a limited or no signature of non-statistical event-by-event fluctuations of the rapidity distributions, this puts limits on the scenarios assuming fireball fragmentation.

FIG. 7: Q-histograms corresponding to various sizes of droplets: b = 5, 10, 20, 50 fm$^3$. The values of R are shown in the legend. The Q-histograms are obtained from samples of 10,000 events where all hadrons have been emitted from droplets. Only charged hadrons have been taken in constructing these histograms.

FIG. 8: ... constructed with charged hadrons only. The values of R are listed in the legend.

The investigation of the signal of other effects (including a comparison to full Monte Carlo transport simulation) in the KS test deserves separate studies and shall be performed in subsequent papers.

FIG. 9: The influence of finite rapidity resolution. Charged hadrons were generated with the Monte Carlo event generator DRAGON with b = 10 fm$^3$ and 50% of hadrons generated from droplets. Before the analysis, rapidities were smeared with a Gaussian distribution with the width 0.1 (long-dashed line), 0.5 (dash-dotted), 1.0 (short-dashed). The solid line shows the result with non-smeared data.

APPENDIX A: EVALUATION OF THE KOLMOGOROV-SMIRNOV DISTRIBUTION
Throughout this paper we use the two-sample two-sided (Kolmogorov-)Smirnov test³. The cumulative distribution function of the difference D for the one-sample test in the case $n \to \infty$ was derived by Kolmogorov:

$$ Q(D) = K_0(D) = -2 \sum_{k=1}^{\infty} (-1)^k \exp(-2k^2D^2) . \qquad (A1) $$

³ The Kolmogorov one-sample test refers to a comparison of one empirical cumulative distribution function based on data with a smooth theoretical distribution function. This is distinguished from the two-sample (Smirnov) test where two data samples are compared with each other. One-sided and two-sided tests refer simply to the difference of two cumulative distribution functions or to its absolute value, respectively.

Later, Smirnov [23] proved that the same result applies to the two-sample test for $n_1, n_2 \to \infty$ under the replacement $D \to d = D\sqrt{n_1 n_2/(n_1 + n_2)}$. We have checked that this asymptotic case is not a good approximation even for n's around 200. It is therefore desirable to obtain formulas valid in the non-asymptotic case.

A simple solution is to replace the quantity $d = \sqrt{n} D$ in eq. (A1) with the following one, originally due to Stephens [31] (found also in Numerical Recipes [32]):

$$ d = D \left( \sqrt{n} + 0.12 + \frac{0.11}{\sqrt{n}} \right) . \qquad (A2) $$
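A minimal Python sketch of the Stephens prescription (A2) combined with the series (A1) is given below. The function names are hypothetical. Using the effective multiplicity $n = n_1 n_2/(n_1 + n_2)$ for a pair of events is an assumption of this example, made by analogy with the replacement quoted above for the two-sample test.

```python
import math


def kolmogorov_q(d, k_max=100):
    """Series (A1): Q(d) = -2 sum_{k>=1} (-1)^k exp(-2 k^2 d^2)."""
    if d < 0.2:
        # Q equals 1 to double precision here; the series converges slowly.
        return 1.0
    total = sum((-1) ** k * math.exp(-2.0 * k * k * d * d)
                for k in range(1, k_max + 1))
    return min(1.0, max(0.0, -2.0 * total))


def q_stephens(d_max, n):
    """Q evaluated at d = D (sqrt(n) + 0.12 + 0.11 / sqrt(n)), eq. (A2)."""
    root_n = math.sqrt(n)
    return kolmogorov_q(d_max * (root_n + 0.12 + 0.11 / root_n))


def effective_n(n_1, n_2):
    """Effective multiplicity of a pair, n1 n2 / (n1 + n2)."""
    return n_1 * n_2 / (n_1 + n_2)


# Two events with 200 particles each and an ECDF distance D = 0.1
print(q_stephens(0.1, effective_n(200, 200)))
```

Since the corrected argument is always larger than $\sqrt{n} D$, the Stephens prescription yields a smaller Q than the purely asymptotic formula for the same D.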

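A different formula with a few terms of an expansion in $1/\sqrt{n}$ has been derived by Li-Chien [?, 34]. In such an expansion

$$ Q(d) = K_0(d) + K_1(d) + K_2(d) + K_3(d) + \dots . \qquad (A3) $$

The leading order term has been displayed in eq. (A1). The following three terms are

$$ K_1(d) = -\frac{4d}{3\sqrt{n}} \sum_{k=1}^{\infty} (-1)^k k^2 \exp(-2k^2d^2) , \qquad (A4) $$

$$ K_2(d) = \frac{1}{9n} \sum_{k=1}^{\infty} (-1)^k \left( k^2 - \frac{1}{2}\left(1 - (-1)^k\right) - 4k^2d^2 \left( k^2 - \frac{1}{2}\left(1 - (-1)^k\right) + 3 \right) + 8k^4d^4 \right) \exp(-2k^2d^2) , \qquad (A5) $$

$$ K_3(d) = -\frac{2d}{27 n^{3/2}} \sum_{k=1}^{\infty} (-1)^k k^2 \left( k^2 + \frac{22}{5} - \frac{3}{2}\left(1 - (-1)^k\right) - k^2d^2 \left( \frac{4k^2}{3} + \frac{88}{15} - 2\left(1 - (-1)^k\right) + 12 \right) + 8k^4d^4 \right) \exp(-2k^2d^2) . \qquad (A6) $$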



These relations were derived for the one-sample test. We checked, however, that they are much easier to handle and give better results when tested on samples of statistically identical events (see below) than approximations to two-sample distributions for non-asymptotic cases [?]. Therefore, we decided to use these relations, although we note that a revision of the formulae for two-sample tests is desirable.

In practical calculations it turns out that it is sufficient to cut off the expansions in eqs. (A1) and (A4)-(A6) at k = 4. To illustrate this point, we compare in Fig. 10 the histograms of Q's calculated from eq. (A2) with histograms based on eq. (A3) with the cut-off at k = 2, 3, and 4, respectively. We show the results for $10^5$ pairs of events chosen randomly out of $10^5$ simulated events. Each simulated event is represented by 'rapidities' generated uniformly on (0, 1), with Poisson-distributed multiplicities with mean values of 50, 200, and 1000. Thus we calculate Q(d) using the expansion (A3) up to the term $K_3$ and evaluate the sums in eqs. (A1) and (A4)-(A6) up to the fourth order in k.

The Li-Chien approximation truncated after k = 4 may lead to negative Q for d > 1.94. Such a pair would fall into the last Q-bin. To fix the problem for these values of d, an approximation due to Marsaglia [37] is employed:

$$ Q(d) = 2 \exp\left[ \left( -2.000071 - \frac{0.331}{\sqrt{n}} - \frac{1.409}{n} \right) n d^2 \right] . \qquad (A7) $$